Alleviating the Class Imbalance problem in Data Mining
Abstract
The class imbalance problem is one of the most important problems in learning from two-class data sets. When examples of one class in a training data set vastly outnumber examples of the other class, standard machine learning algorithms tend to be overwhelmed by the majority class and ignore the minority class. Several algorithms have been proposed in the literature to alleviate this problem. This paper compares three existing algorithms, RUSBoost, EasyEnsemble, and BalanceCascade, using classifiers such as C4.5, SVM, and KNN as base learners. Several experiments were carried out to find the best base learner and the best-performing algorithm for a given class distribution.
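As an illustration only, the following minimal sketch shows random under-sampling of the majority class, the resampling idea that RUSBoost and EasyEnsemble build on. It is not the implementation evaluated in the paper: the function name `random_undersample`, the toy labels, and the seed are assumptions made for this example, and the boosting step and the base learners (C4.5, SVM, KNN) are omitted.

```python
import random
from collections import Counter


def random_undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset of `labels`.

    Every class is randomly reduced to the size of the smallest class.
    (Hypothetical helper written for this illustration.)
    """
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)

    # The smallest class sets the number of examples kept per class.
    target = min(len(indices) for indices in by_class.values())

    kept = []
    for indices in by_class.values():
        # Sample without replacement; the minority class is kept whole.
        kept.extend(rng.sample(indices, target))
    return sorted(kept)


# Assumed toy data: 95 majority (0) and 5 minority (1) examples.
labels = [0] * 95 + [1] * 5
subset = random_undersample(labels, seed=42)
balanced_counts = Counter(labels[i] for i in subset)
```

Here `balanced_counts` holds five examples per class. An ensemble method in the spirit of EasyEnsemble would repeat this sampling with different seeds and train one learner per balanced subset, so that the majority examples discarded in one subset can still appear in another.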
Similar resources
Extracting Predictor Variables to Construct Breast Cancer Survivability Model with Class Imbalance Problem
Application of data mining methods as a decision support system has a great benefit to predict survival of new patients. It also has a great potential for health researchers to investigate the relationship between risk factors and cancer survival. But due to the imbalanced nature of datasets associated with breast cancer survival, the accuracy of survival prognosis models is a challenging issue...
Class Imbalance Problem in Data Mining using Probabilistic Approach
The class imbalance problem arises when one class has far more examples than the other classes. Classical classifiers designed for balanced datasets cannot deal with the class imbalance problem because they pay more attention to the majority class. The main drawback associated with the majority class is loss of important information. The class imbalance problem is difficult due to the amount ...
A Review of Class Imbalance Problem
Class imbalance is one of the challenges in the fields of machine learning and data mining. Imbalanced data sets degrade the performance of data mining and machine learning techniques because overall accuracy and decision making are biased toward the majority class, which leads to misclassifying minority class samples or even treating them as noise. This paper proposes a general survey for class im...
Class Imbalance Problem in Data Mining Review
In the last few years, data classification has seen major changes and advances. As the application areas of technology grow, the size of data also increases. Classification of data becomes difficult because of the unbounded size and imbalanced nature of the data. The class imbalance problem has become one of the greatest issues in data mining. The imbalance problem occurs where one of the two classes havi...
Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...